1 Introduction

1.1 Input files:

  • (AACT) aact_studies.tsv
  • (AACT) aact_drugs.tsv
  • (AACT) aact_descriptions.tsv
  • (LeadMine) aact_drugs_leadmine.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid.tsv
  • (PubChem) aact_drugs_smi_pubchem_cid2inchi.tsv
  • (ChEMBL) aact_drugs_inchi2chembl.tsv
  • (ChEMBL) aact_drugs_chembl_activity_pchembl.tsv
  • (ChEMBL) aact_drugs_chembl_target_component.tsv
  • (TCRD/Pharos) pharos_targets.tsv
  • (JensenLabTagger) aact_descriptions_tagger_matches.tsv
  • (JensenLabDictionary) diseases_entities.tsv

nct_id is the study ID.

## [1] "Wed Apr  3 15:13:13 2019"
library(readr)
library(data.table)
library(plotly, quietly=TRUE)

2 Input studies and drugs

2.1 Studies

Read file of all studies in AACT.

## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
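A minimal sketch of this step, assuming aact_studies.tsv is tab-separated with a header row and an nct_id column. The inline text is a made-up stand-in for the real file, and base R is used here in place of the readr/data.table calls the report itself may rely on.

```r
# Toy stand-in for aact_studies.tsv (made-up rows). The real file could be
# read with read.delim("aact_studies.tsv", stringsAsFactors = FALSE).
tsv_text <- paste0(
  "nct_id\tstudy_type\tphase\n",
  "NCT00000001\tInterventional\tPhase 1\n",
  "NCT00000002\tObservational\tNA\n",
  "NCT00000003\tInterventional\tPhase 2\n"
)
studies <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# Report total rows and distinct study IDs, in the style of the report output.
msg <- sprintf("Total studies: %d ; unique NCT_IDs: %d",
               nrow(studies), length(unique(studies$nct_id)))
print(msg)
```

Equal totals, as in the report, indicate that each row is one study.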

2.2 Drugs

Read file of all drugs in AACT.

  • id is the AACT intervention ID.
  • Note that one study may involve multiple drugs.
  • At this point a “drug” is identified only by its name.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"

2.3 Studies: Interventional drug studies only

Select only studies whose study_type is Interventional and that are linked to at least one drug (via nct_id).

## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
Drug studies and drugs, by phase
phase              N_studies   N_drugs
Early Phase 1           1574      2615
Phase 1                23603     48593
Phase 1/Phase 2         6663     13288
Phase 2                33910     68850
Phase 2/Phase 3         3305      6503
Phase 3                22988     49507
Phase 4                19593     36331
NA                     12785     29390
Drugs (itv_ids), by study overall_status
overall_status                   N
Completed                   145006
Recruiting                   33973
Terminated                   19618
Unknown status               18463
Active, not recruiting       13962
Not yet recruiting            8001
NA                            7080
Withdrawn                     6969
Enrolling by invitation       1060
Suspended                      945
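The selection and the per-phase counts above could be sketched as follows. This is an illustration on made-up rows, not the report's own code: the drug table's name column and all values are assumptions, and rows with a missing phase are left out of the toy summary.

```r
# Hypothetical toy tables (made-up values).
studies <- data.frame(
  nct_id     = c("NCT1", "NCT2", "NCT3", "NCT4"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional"),
  phase      = c("Phase 1", NA, "Phase 2", "Phase 2"),
  stringsAsFactors = FALSE
)
drugs <- data.frame(
  id     = c(101, 102, 103, 104),
  nct_id = c("NCT1", "NCT1", "NCT3", "NCT2"),
  name   = c("aspirin", "placebo", "metformin", "aspirin"),
  stringsAsFactors = FALSE
)

# Keep interventional studies that have at least one drug record.
itv <- studies[studies$study_type == "Interventional", ]
drug_studies <- itv[itv$nct_id %in% drugs$nct_id, ]

# Join drugs to the selected studies (inner join on nct_id).
study_drugs <- merge(drug_studies, drugs, by = "nct_id")

# Count distinct studies and distinct intervention IDs per phase.
count_unique <- function(values, phases, p) {
  length(unique(values[which(phases == p)]))
}
by_phase <- data.frame(phase = sort(unique(study_drugs$phase)),
                       stringsAsFactors = FALSE)
by_phase$N_studies <- vapply(by_phase$phase, function(p)
  count_unique(study_drugs$nct_id, study_drugs$phase, p),
  integer(1), USE.NAMES = FALSE)
by_phase$N_drugs <- vapply(by_phase$phase, function(p)
  count_unique(study_drugs$id, study_drugs$phase, p),
  integer(1), USE.NAMES = FALSE)
print(by_phase)
```

Because one study may involve several drugs, N_drugs can exceed N_studies within a phase, as it does in the table above.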

2.4 Drugs by study start_year

(To do: stack with study start_year.)

## Warning: Ignoring 1 observations

2.5 Drug-trials by Phase and Status

3 NextMove LeadMine NER

AACT drug names are resolved by LeadMine to standard names and to chemical structures, represented as SMILES. With structures available, drugs can be counted in a cheminformatically rigorous way, as active pharmaceutical ingredients (APIs) rather than as names.

## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"

3.1 NER mentions by intervention ID.

## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"

3.2 NER mentions by trial (NCT ID).

## [1] "Mentions by study: 92966 / 99647 (93.3%)"

3.3 NER mentions by drug, i.e. name in AACT.

## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
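The coverage lines in this section share one shape: a count, a total, and a percentage. A small helper for that formatting, assuming each figure is simply a count of keys with a mention over a count of all keys (the function name coverage is hypothetical):

```r
# Hypothetical helper: format "label: n / total (pct%)" like the report lines.
coverage <- function(label, n, total) {
  sprintf("%s: %d / %d (%.1f%%)",
          label, as.integer(n), as.integer(total), 100 * n / total)
}

# Counts taken from the report line for drug names.
line <- coverage("Mentions by drug name", 11108, 58297)
print(line)
```

Applied to the counts above, the helper gives the same 19.1% as the report.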

4 PUBCHEM:

4.1 Intervention IDs to CIDs from PubChem (via SMILES)

## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"

4.2 InChIKeys from PubChem (via CIDs)

## [1] "PubChem CIDs with InChIKeys: 3801"

5 CHEMBL:

5.1 ChEMBL molecule IDs, and properties (via InChIKeys)

## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
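The mapping chain in sections 4 and 5.1 (SMILES to CID, CID to InChIKey, InChIKey to ChEMBL ID) amounts to a series of inner joins, which is why the counts shrink at each step. A sketch on placeholder values, with assumed column names:

```r
# Placeholder mapping tables; identifiers and column names are made up.
smi2cid <- data.frame(smiles = c("SMI_A", "SMI_B", "SMI_C"),
                      cid    = c(1, 2, NA),
                      stringsAsFactors = FALSE)
cid2inchi <- data.frame(cid      = c(1, 2),
                        inchikey = c("KEY_A", "KEY_B"),
                        stringsAsFactors = FALSE)
inchi2chembl <- data.frame(inchikey  = "KEY_A",
                           chembl_id = "CHEMBL_A",
                           stringsAsFactors = FALSE)

# Step 1: keep SMILES that resolved to a PubChem CID.
hits <- smi2cid[!is.na(smi2cid$cid), ]
# Step 2: attach InChIKeys via CID (inner join drops unmatched CIDs).
with_inchi <- merge(hits, cid2inchi, by = "cid")
# Step 3: attach ChEMBL IDs via InChIKey.
with_chembl <- merge(with_inchi, inchi2chembl, by = "inchikey")

print(c(smiles   = nrow(smi2cid), cids = nrow(hits),
        inchikeys = nrow(with_inchi), chembl = nrow(with_chembl)))
```

Using merge(..., all.x = TRUE) instead would keep unmatched rows, which can help when inspecting where compounds drop out.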

5.2 ChEMBL activities for mapped compounds

Keep only activities that have a pChEMBL value, as a confidence filter.

## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
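A sketch of the filter and the summary counts, on made-up rows. The column names follow ChEMBL's usual field names but are assumptions about aact_drugs_chembl_activity_pchembl.tsv.

```r
# Made-up activity rows; column names are assumed.
act <- data.frame(
  molecule_chembl_id = c("M1", "M1", "M2", "M3"),
  target_chembl_id   = c("T1", "T2", "T1", "T3"),
  document_chembl_id = c("D1", "D1", "D2", "D3"),
  pchembl_value      = c(6.5, NA, 7.2, NA),
  stringsAsFactors = FALSE
)

# Keep only rows with a pChEMBL value.
act <- act[!is.na(act$pchembl_value), ]

# Distinct molecules, targets and documents among the remaining activities.
n_unique <- function(x) length(unique(x))
summary_line <- sprintf(
  "ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
  n_unique(act$molecule_chembl_id),
  n_unique(act$target_chembl_id),
  n_unique(act$document_chembl_id))
print(summary_line)
```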

5.3 ChEMBL targets (via activities)

## [1] "ChEMBL target proteins: 3157"

6 IDG/TCRD:

## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"

6.1 Targets by organism (top 10):

## [1] "Organisms: 187"
Targets by organism (top 10)
organism                      N_targets
Homo sapiens                       1806
Rattus norvegicus                   529
Mus musculus                        238
Bos taurus                           98
Sus scrofa                           36
Cavia porcellus                      26
Escherichia coli K-12                19
Oryctolagus cuniculus                18
Escherichia coli                     17
Mycobacterium tuberculosis           17

6.2 Human single-protein targets only.

## [1] "Human targets: 1806"
target_type                       N
SINGLE PROTEIN                 1216
PROTEIN COMPLEX                 247
PROTEIN FAMILY                  210
PROTEIN COMPLEX GROUP            91
PROTEIN-PROTEIN INTERACTION      16
SELECTIVITY GROUP                14
CHIMERIC PROTEIN                 12
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
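The human single-protein selection could look like the following sketch. The target table, its column names and the accession placeholders are all made up for illustration.

```r
# Made-up target rows; column names and accessions are placeholders.
targets <- data.frame(
  target_chembl_id = c("T1", "T2", "T3", "T4"),
  organism    = c("Homo sapiens", "Homo sapiens",
                  "Rattus norvegicus", "Homo sapiens"),
  target_type = c("SINGLE PROTEIN", "PROTEIN COMPLEX",
                  "SINGLE PROTEIN", "SINGLE PROTEIN"),
  uniprot     = c("ACC1", "ACC2", "ACC3", "ACC4"),
  stringsAsFactors = FALSE
)

# Restrict to human targets, then tabulate target types (largest first).
human <- targets[targets$organism == "Homo sapiens", ]
type_counts <- sort(table(human$target_type), decreasing = TRUE)
print(type_counts)

# Keep single-protein targets and count distinct accessions.
human_sp <- human[human$target_type == "SINGLE PROTEIN", ]
msg <- sprintf("Human single-protein targets: %d ; unique UniProts: %d",
               nrow(human_sp), length(unique(human_sp$uniprot)))
print(msg)
```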

6.3 Targets by IDG Target Development Level (TDL):

## [1] "   Tchem:    733" "   Tclin:    341" "    Tbio:    140"
## [4] "   Tdark:      2"
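A sketch of how such TDL counts might be derived, assuming the single-protein targets are joined to pharos_targets.tsv on a UniProt accession column. Both the column names (uniprot, tdl) and the rows are assumptions.

```r
# Made-up single-protein targets and TCRD rows; column names are assumed.
human_sp <- data.frame(uniprot = c("ACC1", "ACC2", "ACC3"),
                       stringsAsFactors = FALSE)
tcrd <- data.frame(uniprot = c("ACC1", "ACC2", "ACC3", "ACC4"),
                   tdl     = c("Tclin", "Tchem", "Tchem", "Tdark"),
                   stringsAsFactors = FALSE)

# Attach the Target Development Level to each target (inner join).
mapped <- merge(human_sp, tcrd, by = "uniprot")

# Count targets per TDL, largest first, and format one string per level.
tdl_counts <- sort(table(mapped$tdl), decreasing = TRUE)
out <- sprintf("%8s: %6d", names(tdl_counts), as.integer(tdl_counts))
print(out)
```

In the report, the four levels sum to the 1216 human single-protein targets, so every target received a TDL.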

7 Diseases NER with JensenLab Tagger

id is the AACT primary key of the detailed_descriptions table. For disease entities, serialno corresponds to the DOID (Disease Ontology ID).

Top 20 diseases by total mentions
doid N_mentions terms
DOID:4 76402 DISEASE;Disease;dis- ease;dis-ease;disease
DOID:0111161 73734 CAN;CaN;Can;can
DOID:162 28596 CANCER;CANcer;Cancer;Malignant Tumor;Malignant neoplasm;Malignant tumor;Primary Cancer;Primary cancer;cancer;malignant Tumor;malignant neoplasm;malignant tumor;primary cancer
DOID:9351 17274 DIABETES;DIABETES MELLITUS;DIAbetes;DIabetes;Diabetes;Diabetes Mellitus;Diabetes mellitus;diabetes;diabetes Mellitus;diabetes mellitus;diabetes-mellitus
DOID:6713 16632 CVA;Cerebrovascular Accident;Cerebrovascular Disease;Cerebrovascular accident;Cerebrovascular disease;STROKE;STRokE;Stroke;cerebro- vascular disease;cerebro-vascular disease;cerebrovascular accident;cerebrovascular disease;cerebrovascular disorder;cerebrovascular syndrome;cv-a;cva;stroKe;stroke
DOID:2030 12084 ANXIETY;Anxiety;Anxiety Disorder;Anxiety state;anxiety;anxiety disorder;anxiety state;anxiety syndrome;anxiety-state
DOID:1612 10583 BREAST CANCER;BReast CAncer;BReast Cancer;Breast Cancer;Breast cancer;Breast tumor;Breast-cancer;Primary breast cancer;breast Cancer;breast caNcEr;breast cancer;breast tumor;breast-cancer;breastcancer;mammary cancer;mammary tumor;primary breast cancer
DOID:2841 10021 ASTHMA;Asthma;BHR;Bronchial hyper-reactivity;Bronchial hyperreactivity;EIA;Exercise-induced asthma;asthma;bronchial hyper reactivity;bronchial hyper-reactivity;bronchial hyperreactivity;exercise induced asthma;exercise-induced asthma
DOID:3083 9782 CHRONIC OBSTRUCTIVE PULMONARY DISEASE;COLD;COPD;COPd;Chronic Obstructive Lung Disease;Chronic Obstructive Lung disease;Chronic Obstructive Pulmonary Disease;Chronic Obstructive Pulmonary disease;Chronic Obstructive lung Disease;Chronic Obstructive pulmonary Disease;Chronic Obstructive pulmonary disease;Chronic obstructive airway disease;Chronic obstructive lung disease;Chronic obstructive pulmonary disease;Cold;chronic Obstructive Lung Disease;chronic obstructive airway disease;chronic obstructive lung disease;chronic obstructive pulmonary disease;chronic obstructive pulmonary disorder;cold;copd
DOID:9970 9303 OBESITY;OBesity;Obesity;obEsity;obe-sity;obesity
DOID:10763 9144 HBP;HTN;HYPERTENSION;High Blood Pressure;High blood pressure;High-blood pressure;Hypertension;Hypertensive disease;high blood Pressure;high blood pressure;high blood-pressure;htn;hyper-tension;hypertension;hypertensive disease;hypertensive disorder
DOID:3393 6816 C-HD;CAD;CHD;CORONARY ARTERY DISEASE;CORONARY SYNDROME;CORONARY syndrome;ChD;Coronary ARtery DIsease;Coronary Artery Disease;Coronary Disease;Coronary Heart Disease;Coronary Heart disease;Coronary Syndrome;Coronary artery disease;Coronary disease;Coronary heart disease;Coronary-artery-disease;coronary Syndrome;coronary arteriosclerosis;coronary artery dis-ease;coronary artery disease;coronary disease;coronary heart disease;coronary syndrome;coronary-artery disease;coronary-artery-disease
DOID:0060145 6115 ANALGESIA;Analgesia;analgeSia;analgesia
DOID:0111084 5958 FACE;FaCE;Face;face
DOID:9352 5848 Diabetes Mellitus Type 2;Diabetes Mellitus Type II;Diabetes Mellitus type 2;Diabetes Mellitus, Type II;Diabetes mellitus Type 2;Diabetes mellitus non-insulin-dependent;Diabetes mellitus type 2;Diabetes mellitus type II;NIDDM;Non-Insulin Dependent Diabetes Mellitus;Non-Insulin-Dependent-Diabetes Mellitus;Non-insulin dependent diabetes mellitus;Non-insulin-dependent Diabetes Mellitus;Type 2 - Diabetes Mellitus;Type 2 Diabetes;Type 2 Diabetes Mellitus;Type 2 Diabetes mellitus;Type 2 diabetes;Type 2 diabetes mellitus;Type 2-diabetes mellitus;Type II Diabetes;Type II Diabetes Mellitus;Type II Diabetes mellitus;Type II diabetes;Type II diabetes mellitus;Type-2 Diabetes;Type-2 Diabetes Mellitus;Type-2 diabetes;Type-2 diabetes mellitus;Type-2-diabetes;Type-II diabetes;Type2 Diabetes Mellitus;Type2 diabetes;Type2 diabetes mellitus;diabetes mellitus type 2;diabetes mellitus type II;diabetes mellitus type-2;diabetes mellitus type2;diabetes mellitus, type 2;maturity onset diabetes;maturity-onset diabetes;non insulin dependent diabetes mellitus;non insulin-dependent diabetes mellitus;non-insulin dependent diabetes mellitus;non-insulin-dependent diabetes mellitus;noninsulin-dependent diabetes mellitus;type -2 diabetes mellitus;type 2 Diabetes;type 2 Diabetes Mellitus;type 2 diabetes;type 2 diabetes mellitus;type 2-diabetes;type 2diabetes;type 2diabetes mellitus;type II Diabetes;type II Diabetes Mellitus;type II diabetes;type II diabetes mellitus;type II-diabetes;type-2 Diabetes;type-2 diabetes;type-2 diabetes mellitus;type-2-diabetes;type-II diabetes;type-II diabetes mellitus;type-II- diabetes mellitus;type2 diabetes;type2 diabetes mellitus
DOID:10283 5056 Familial Prostate Cancer;HPC;PRostate Cancer;Prostate CAncer;Prostate Cancer;Prostate cancer;Prostatic cancer;hereditary prostate cancer;prostate Cancer;prostate cancer;prostate-cancer;prostatic cancer
DOID:8469 4985 FLU;Flu;Influenza;flu;influenza
DOID:225 4962 SYNDROME;Syndrome;syn drome;syndrome
DOID:3908 4959 NSCLC;Non Small Cell Lung Cancer;Non Small Cell Lung Carcinoma;Non Small Cell Lung cancer;Non small cell lung cancer;Non small-cell lung cancer;Non- small cell lung cancer;Non-Small Cell Lung Cancer;Non-Small Cell Lung Carcinoma;Non-Small Cell Lung cancer;Non-Small cell lung cancer;Non-Small- Cell Lung Cancer;Non-Small-Cell Lung Cancer;Non-Small-Cell lung Cancer;Non-small Cell Lung Cancer;Non-small Cell Lung Carcinoma;Non-small cell Lung Cancer;Non-small cell lung cancer;Non-small cell lung carcinoma;Non-small-cell Lung Cancer;Non-small-cell lung cancer;nSCLC;non small cell lung cancer;non small cell lung carcinoma;non small-cell lung cancer;non- small cell lung cancer;non-small Cell Lung Cancer;non-small cell Lung cancer;non-small cell lung Cancer;non-small cell lung cancer;non-small cell lung carcinoma;non-small-cell lung cancer;non-small-cell lung carcinoma;non-small-cell lung-cancer;nonsmall cell lung cancer;nonsmall cell lung cancer;nonsmall- cell lung cancer
DOID:784 4841 CKD;CKF;CRD;CRF;Chronic Kidney Disease;Chronic Kidney disease;Chronic Kidney failure;Chronic Renal Disease;Chronic kidney disease;Chronic kidney failure;Chronic renal disease;chronic Kidney disease;chronic kidney disease;chronic kidney failure;chronic renal disease;chronic renal failure syndrome;ckd;crf;renal failure chronic
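A table of this shape can be built by grouping tagger matches by DOID, counting rows, and collapsing the distinct matched strings. The sketch below assumes one row per mention and uses made-up rows and column names. It sorts terms with method = "radix", which orders uppercase before lowercase as in the table above.

```r
# Made-up tagger matches: one row per mention (assumed layout).
matches <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  doid = c("DOID:162", "DOID:162", "DOID:162", "DOID:9351", "DOID:9351"),
  term = c("cancer", "Cancer", "cancer", "diabetes", "Diabetes Mellitus"),
  stringsAsFactors = FALSE
)

# Mentions per disease.
n_mentions <- aggregate(term ~ doid, data = matches, FUN = length)
names(n_mentions)[2] <- "N_mentions"

# Distinct matched strings per disease, joined with ";".
terms <- aggregate(term ~ doid, data = matches, FUN = function(x)
  paste(sort(unique(x), method = "radix"), collapse = ";"))
names(terms)[2] <- "terms"

# Combine, order by mention count, and keep the top 20.
top <- merge(n_mentions, terms, by = "doid")
top <- head(top[order(-top$N_mentions), ], 20)
print(top)
```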